Abstract

Diabetes is growing at an epidemic rate in the United States, and what’s true nationwide is also true in North Carolina. Diabetes and prediabetes cost an estimated $10.9 billion in North Carolina each year (American Diabetes Association, 2015). This post introduces an exploration of the diabetes epidemic in North Carolina. Through a series of posts, this project will examine the public data available on diabetes and explore possible solutions to address the rise of diabetes in North Carolina. This investigation stems from the Capstone project of my Health Care Informatics Masters program. This post will answer the following questions:

  1. What is the overall trend of diabetes prevalence in the United States?
  2. How does diabetes prevalence vary by state and region?
  3. How do trends in diabetes prevalence vary across the counties of North Carolina?
  4. In which counties of North Carolina did the most change in diabetes prevalence occur?
  5. How does the change in diabetes prevalence compare between rural and urban counties?

Environment

This section contains technical information for deeper analysis and reproduction. Casual readers are invited to skip it.

Packages used in this report.

Definitions of global objects (file paths, factor levels, object groups) used throughout the report.

Data

The data for this exploration comes from several sources:

  1. The diabetes data sets for the state and county levels were sourced from the US Diabetes Surveillance System, Division of Diabetes Translation, Centers for Disease Control and Prevention. The data were downloaded one year per file and compiled into a single data set for analysis.

  2. The diabetes data set for the national level was sourced from the CDC’s National Health Interview Survey (NHIS).

  3. The list of rural counties was taken from the Office of Rural Health Policy. The list is available here.
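The "one year per file" compilation step can be sketched roughly as follows. This is a minimal illustration in base R, not the code used for this report: the file names, the column names, and the values are all made up, and the real CDC exports may be laid out differently.

```r
# Hypothetical layout: one CSV per year, with the year encoded in the file name.
tmp_dir <- tempfile("diabetes_")
dir.create(tmp_dir)

# Write two small stand-in files (made-up values) so the sketch runs on its own.
write.csv(
  data.frame(county = c("Alamance", "Bertie"), prevalence = c(9.1, 12.4)),
  file.path(tmp_dir, "county_2006.csv"), row.names = FALSE
)
write.csv(
  data.frame(county = c("Alamance", "Bertie"), prevalence = c(9.8, 13.0)),
  file.path(tmp_dir, "county_2007.csv"), row.names = FALSE
)

read_one_year <- function(path) {
  d <- read.csv(path, stringsAsFactors = FALSE)
  # Recover the year from the file name, e.g. "county_2006.csv" -> 2006.
  d$year <- as.integer(sub(".*_([0-9]{4})\\.csv$", "\\1", basename(path)))
  d
}

# Read every yearly file and stack the results into one data frame.
paths    <- list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE)
combined <- do.call(rbind, lapply(paths, read_one_year))
```

Carrying the year as a column, rather than keeping one object per year, is what makes the later trend plots straightforward.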

Data Manipulation

The combined data used in this analysis can be downloaded here. The only tweaks done here are to merge in the rural counties column and the data for creating maps.
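As a rough illustration of adding the rural counties column, the sketch below flags each county by whether it appears in a rural list. The county labels, column names, and values are placeholders rather than the actual data.

```r
# Made-up county estimates; column names are assumptions.
estimates <- data.frame(
  county     = c("A", "B", "C"),
  year       = c(2006, 2006, 2006),
  prevalence = c(9.0, 11.5, 10.2),
  stringsAsFactors = FALSE
)

# Made-up stand-in for the list of rural counties.
rural_list <- data.frame(county = c("B", "C"), stringsAsFactors = FALSE)

# A county is labeled "Rural" if it appears in the list, "Urban" otherwise.
estimates$rural <- ifelse(estimates$county %in% rural_list$county, "Rural", "Urban")
```

Matching on county name only works if both sources spell the names identically, so a real merge would likely need a shared identifier or some name cleaning first.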

Tweaks

Merge

Overall - National Level

Overall, the national average for diagnosed diabetes rose sharply through the early 2000s, leveling off around 2010. These numbers, however, are estimates based on responses to the CDC’s National Health Interview Survey and do not represent actual confirmed diagnoses. The CDC estimates that 1 in 5 adults with diabetes do not know they have it, so the numbers reported by the NHIS are likely to underestimate the true prevalence (Centers for Disease Control and Prevention, 2020).

Overall - State Level

State- and county-level data on diabetes prevalence are taken from the CDC’s Behavioral Risk Factor Surveillance System (BRFSS). These results are based on the question “Has a doctor, nurse, or other health professional ever told you that you have diabetes?” Women who experienced diabetes only during pregnancy were excluded from the counts. The BRFSS is an ongoing, monthly telephone survey of the non-institutionalized adult (aged 18 years or older) population in each state. In 2011 the survey methodology changed substantially and began to include homes without a landline phone. This change was expected to increase coverage of people with lower incomes, people with lower educational levels, and younger age groups, because these groups more often rely exclusively on cellular telephones for personal communication (Pierannunzi et al., 2012).

The above graph shows diabetes prevalence trends by state, grouped into regions based on the US Census classification. While all regions of the United States show growth in diabetes prevalence, the South exhibits a slightly higher growth rate as well as the highest prevalence.

Within the South region, North Carolina falls close to the middle in diabetes prevalence.

Overall - North Carolina

When examining North Carolina as a whole, we can see that NC has been trending much higher than the United States as a whole. In 2016 there was a large spike in diagnosed cases. Unfortunately, this is the last year of data available, so we cannot yet see whether this upward trend continues. The data below were compiled by taking the average of all county-level data in North Carolina. Note that this trend line is slightly higher than in the previous graphs. This is due to the age cutoffs used for national- and state-level data versus county-level data: the previous data classify adults as 18 years of age or older, whereas the county-level data classify adults as 20 years of age or older. Removing 18- and 19-year-olds, who typically have fewer diagnosed cases of diabetes than older adults, shifts the data slightly upward.
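Taking the average of all county-level data for each year could look something like the sketch below. The values are made up, and the column names are assumptions. Note that this is an unweighted mean, so small and large counties count equally; it is not a population-weighted state estimate.

```r
# Made-up county-level estimates for two years.
county <- data.frame(
  county     = rep(c("A", "B", "C"), times = 2),
  year       = rep(c(2006, 2007), each = 3),
  prevalence = c(9, 10, 11, 10, 11, 12)
)

# Unweighted mean of the county estimates within each year.
nc_trend <- aggregate(prevalence ~ year, data = county, FUN = mean)
```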

We see a spike in 2016, the last year for which the data are available. However, we should be careful with our interpretation of this pattern, because the examination of the county-level trajectories reveals an aberration in the trend that requires a more rigorous investigation.

While all of North Carolina has a higher prevalence than the national average, rural counties have systematically higher prevalence of diabetes than urban counties. Note that after 2011 both urban and rural counties break the upward trend exhibited in the previous 5 years. This could be explained by the addition of cell phones to the BRFSS survey, as many rural areas are lower-income areas whose residents may rely only on a cell phone for communication. As mentioned previously, there is an odd spike in cases in 2016 that cannot be explained by current documentation. For the purposes of this evaluation, 2016 will be excluded from the county-level data, since the odd trend cannot be explained and no further data are available to determine whether this is a real spike or could be attributed to a methodology change or data quality.
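Excluding 2016 and then comparing rural and urban counties by year can be sketched as below. This is only an illustration with made-up values and assumed column names (`year`, `rural`, `prevalence`).

```r
# Made-up county-level estimates, including an inflated 2016.
county <- data.frame(
  year       = rep(c(2015, 2016), each = 4),
  rural      = rep(c("Rural", "Rural", "Urban", "Urban"), times = 2),
  prevalence = c(12, 14, 9, 11, 18, 20, 15, 17)
)

# Drop the unexplained 2016 estimates before summarizing.
kept <- subset(county, year != 2016)

# Mean prevalence for each year and county type.
by_type <- aggregate(prevalence ~ year + rural, data = kept, FUN = mean)
```

Filtering once, up front, keeps every later county-level summary consistent about which years are in scope.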

By County - Geographical

County-level data first became available in 2004, and three years of data are used to build each estimate. For example, the 2006 estimates were computed using the data from the 2005, 2006, and 2007 BRFSS survey rounds. The county-level estimates are indirect, model-dependent estimates produced with Bayesian multilevel modeling techniques (Barker et al., 2013; JNK, 2003). This model-dependent approach employs a statistical model that “borrows strength” in making an estimate for one county from BRFSS data collected in other counties and states. Multilevel binomial regression models with random effects of demographic variables (age 20-44, 45-64, and 65 or older; race/ethnicity; sex) at the county level were developed. Estimates were adjusted for age to the 2000 US standard population using age groups of 20-44, 45-64, and 65 or older (Klein & Schoenborn, 2001).
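To make the age-adjustment step more concrete, here is a small sketch of direct standardization: each age group's prevalence is weighted by that group's share of a standard population, and the weighted values are summed. The prevalence values and the weights below are placeholders chosen for easy arithmetic. They are not the published 2000 US standard population weights, and this is not the CDC's modeling code.

```r
# Age-specific prevalence (%) for one hypothetical county.
age_specific <- c("20-44" = 3.0, "45-64" = 12.0, "65+" = 20.0)

# Placeholder weights, NOT the published 2000 US standard population shares.
std_weights <- c("20-44" = 0.5, "45-64" = 0.3, "65+" = 0.2)

# Weights must describe a complete population, so they should sum to 1.
stopifnot(isTRUE(all.equal(sum(std_weights), 1)))

# Direct standardization: weighted sum of the age-specific rates.
age_adjusted <- sum(age_specific * std_weights[names(age_specific)])
```

Because every county is weighted to the same standard population, differences in the adjusted estimates are not driven by one county simply having older residents than another.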


The following graphs display the total estimated prevalence of diabetes in each of the 100 North Carolina counties. To keep the scaling consistent between the graphs, we binned the estimates into 6 intervals of the same size. Rural counties are highlighted with a stronger border line as well as a letter “R” at their geographic centers. These graphs allow us to view geographical clusters of diabetes prevalence.

The following histogram shows the change in prevalence between 2006 and 2014. In 2006 there were 41 rural counties in the bottom half of the bins. In 2014 only 29 rural counties remained in the bottom half.
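One way to bin estimates into 6 equal-width intervals that stay consistent across maps is to compute the breaks once from the pooled range and reuse them for every year. The sketch below assumes made-up prevalence values and is not necessarily how the maps in this post were built.

```r
# Made-up prevalence estimates (%) for two map years.
prev_2006 <- c(6.5, 8.0, 9.4, 10.1)
prev_2014 <- c(9.0, 11.7, 13.9, 15.5)

# Seven break points over the pooled range give six equal-width intervals.
pooled <- c(prev_2006, prev_2014)
breaks <- seq(min(pooled), max(pooled), length.out = 7)

# Reusing the same breaks keeps the color scale comparable between years.
bins_2006 <- cut(prev_2006, breaks = breaks, include.lowest = TRUE)
bins_2014 <- cut(prev_2014, breaks = breaks, include.lowest = TRUE)
```

Setting `include.lowest = TRUE` matters here: without it, the county with the minimum value would fall outside the first interval and show up as missing.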

By County - Percent Change

The following graphs display the overall change in estimated prevalence between 2006 and 2014.

County type   Increase   Decrease   No change
Rural               35         16           3
Urban               34         10           2

By examining the trends, we see that urban and rural counties saw similar increases in estimated prevalence between 2006 and 2014. Despite these similar increases, rural counties suffer a higher overall prevalence of diabetes than their urban counterparts.
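A count of counties by direction of change, like the one tabulated in this section, could be produced along these lines. The sketch uses made-up values and assumed column names, and it treats "percent change" as change relative to the 2006 estimate, which is an assumption on our part.

```r
# Made-up 2006 and 2014 estimates (%) for four hypothetical counties.
wide <- data.frame(
  county = c("A", "B", "C", "D"),
  rural  = c("Rural", "Rural", "Urban", "Urban"),
  p2006  = c(10.0, 12.0, 9.0, 8.5),
  p2014  = c(11.5, 11.0, 9.0, 9.5)
)

# Percent change relative to the 2006 estimate.
wide$pct_change <- 100 * (wide$p2014 - wide$p2006) / wide$p2006

# Classify each county as increasing, decreasing, or unchanged.
wide$direction <- ifelse(wide$pct_change > 0, "pos",
                  ifelse(wide$pct_change < 0, "neg", "no_c"))

# Cross-tabulate county type against direction of change.
counts <- table(wide$rural, wide$direction)
```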

Next Steps

This concludes the first post discussing the diabetes epidemic in rural North Carolina. In future posts we will look at risk factors for diabetes, how population growth and decline could be affecting the trends, and various prevention methods for diabetes.

Session information

===========================================================================

For the sake of documentation and reproducibility, the current report was rendered in the following environment.

Environment

- Session info -------------------------------------------------------------------------------------------------------
 setting  value                       
 version  R version 3.6.2 (2019-12-12)
 os       Windows 10 x64              
 system   x86_64, mingw32             
 ui       RTerm                       
 language (EN)                        
 collate  English_United States.1252  
 ctype    English_United States.1252  
 tz       America/New_York            
 date     2020-06-09                  

- Packages -----------------------------------------------------------------------------------------------------------
 package      * version  date       lib source        
 abind          1.4-5    2016-07-21 [1] CRAN (R 3.6.0)
 acepack        1.4.1    2016-10-29 [1] CRAN (R 3.6.3)
 assertthat     0.2.1    2019-03-21 [1] CRAN (R 3.6.1)
 backports      1.1.5    2019-10-02 [1] CRAN (R 3.6.1)
 base64enc      0.1-3    2015-07-28 [1] CRAN (R 3.6.0)
 bit            1.1-15.1 2020-01-14 [1] CRAN (R 3.6.2)
 bit64          0.9-7    2017-05-08 [1] CRAN (R 3.6.0)
 blob           1.2.1    2020-01-20 [1] CRAN (R 3.6.2)
 broom          0.5.6    2020-04-20 [1] CRAN (R 3.6.3)
 callr          3.4.1    2020-01-24 [1] CRAN (R 3.6.2)
 car            3.0-6    2019-12-23 [1] CRAN (R 3.6.2)
 carData        3.0-3    2019-11-16 [1] CRAN (R 3.6.1)
 cellranger     1.1.0    2016-07-27 [1] CRAN (R 3.6.1)
 checkmate      2.0.0    2020-02-06 [1] CRAN (R 3.6.3)
 chron          2.3-55   2020-02-02 [1] CRAN (R 3.6.3)
 class          7.3-15   2019-01-01 [2] CRAN (R 3.6.2)
 classInt       0.4-3    2020-04-07 [1] CRAN (R 3.6.3)
 cli            2.0.1    2020-01-08 [1] CRAN (R 3.6.2)
 cluster        2.1.0    2019-06-19 [2] CRAN (R 3.6.2)
 colorspace     1.4-1    2019-03-18 [1] CRAN (R 3.6.1)
 corrplot       0.84     2017-10-16 [1] CRAN (R 3.6.3)
 crayon         1.3.4    2017-09-16 [1] CRAN (R 3.6.1)
 curl           4.3      2019-12-02 [1] CRAN (R 3.6.2)
 data.table     1.12.8   2019-12-09 [1] CRAN (R 3.6.2)
 DBI            1.1.0    2019-12-15 [1] CRAN (R 3.6.2)
 desc           1.2.0    2018-05-01 [1] CRAN (R 3.6.2)
 devtools       2.2.1    2019-09-24 [1] CRAN (R 3.6.2)
 digest         0.6.21   2019-09-20 [1] CRAN (R 3.6.1)
 dlookr         0.3.13   2020-01-09 [1] CRAN (R 3.6.3)
 DMwR           0.4.1    2013-08-08 [1] CRAN (R 3.6.3)
 dplyr        * 1.0.0    2020-05-29 [1] CRAN (R 3.6.3)
 e1071          1.7-3    2019-11-26 [1] CRAN (R 3.6.3)
 ellipsis       0.3.0    2019-09-20 [1] CRAN (R 3.6.1)
 evaluate       0.14     2019-05-28 [1] CRAN (R 3.6.1)
 fansi          0.4.1    2020-01-08 [1] CRAN (R 3.6.2)
 farver         2.0.3    2020-01-16 [1] CRAN (R 3.6.2)
 flextable    * 0.5.9    2020-03-06 [1] CRAN (R 3.6.3)
 forcats        0.4.0    2019-02-17 [1] CRAN (R 3.6.1)
 foreign        0.8-75   2020-01-20 [2] CRAN (R 3.6.2)
 Formula        1.2-3    2018-05-03 [1] CRAN (R 3.6.0)
 fs             1.3.1    2019-05-06 [1] CRAN (R 3.6.1)
 gdtools        0.2.2    2020-04-03 [1] CRAN (R 3.6.2)
 generics       0.0.2    2018-11-29 [1] CRAN (R 3.6.1)
 gghighlight    0.2.0    2020-01-25 [1] CRAN (R 3.6.2)
 ggplot2      * 3.3.0    2020-03-05 [1] CRAN (R 3.6.3)
 ggpmisc        0.3.4    2020-04-22 [1] CRAN (R 3.6.3)
 ggrepel        0.8.1    2019-05-07 [1] CRAN (R 3.6.1)
 glue           1.4.1    2020-05-13 [1] CRAN (R 3.6.3)
 gridExtra      2.3      2017-09-09 [1] CRAN (R 3.6.1)
 gsubfn         0.7      2018-03-16 [1] CRAN (R 3.6.3)
 gtable         0.3.0    2019-03-25 [1] CRAN (R 3.6.1)
 haven          2.2.0    2019-11-08 [1] CRAN (R 3.6.2)
 highr          0.8      2019-03-20 [1] CRAN (R 3.6.1)
 Hmisc          4.4-0    2020-03-23 [1] CRAN (R 3.6.3)
 hms            0.5.3    2020-01-08 [1] CRAN (R 3.6.2)
 htmlTable      1.13.3   2019-12-04 [1] CRAN (R 3.6.3)
 htmltools      0.4.0    2019-10-04 [1] CRAN (R 3.6.1)
 htmlwidgets    1.5.1    2019-10-08 [1] CRAN (R 3.6.1)
 httr           1.4.1    2019-08-05 [1] CRAN (R 3.6.2)
 inum           1.0-1    2019-04-25 [1] CRAN (R 3.6.3)
 jpeg           0.1-8.1  2019-10-24 [1] CRAN (R 3.6.1)
 kableExtra     1.1.0    2019-03-16 [1] CRAN (R 3.6.3)
 KernSmooth     2.23-16  2019-10-15 [2] CRAN (R 3.6.2)
 knitr        * 1.28     2020-02-06 [1] CRAN (R 3.6.2)
 labeling       0.3      2014-08-23 [1] CRAN (R 3.6.0)
 lattice        0.20-38  2018-11-04 [2] CRAN (R 3.6.2)
 latticeExtra   0.6-29   2019-12-19 [1] CRAN (R 3.6.3)
 libcoin        1.0-5    2019-08-27 [1] CRAN (R 3.6.3)
 lifecycle      0.2.0    2020-03-06 [1] CRAN (R 3.6.3)
 magrittr     * 1.5      2014-11-22 [1] CRAN (R 3.6.1)
 mapdata      * 2.3.0    2018-03-30 [1] CRAN (R 3.6.2)
 maps         * 3.3.0    2018-04-03 [1] CRAN (R 3.6.2)
 MASS           7.3-51.5 2019-12-20 [2] CRAN (R 3.6.2)
 Matrix         1.2-18   2019-11-27 [2] CRAN (R 3.6.2)
 memoise        1.1.0    2017-04-21 [1] CRAN (R 3.6.2)
 mgcv           1.8-31   2019-11-09 [2] CRAN (R 3.6.2)
 mice           3.9.0    2020-05-14 [1] CRAN (R 3.6.3)
 moments        0.14     2015-01-05 [1] CRAN (R 3.6.0)
 munsell        0.5.0    2018-06-12 [1] CRAN (R 3.6.1)
 mvtnorm        1.1-0    2020-02-24 [1] CRAN (R 3.6.2)
 nlme           3.1-143  2019-12-10 [2] CRAN (R 3.6.2)
 nnet           7.3-12   2016-02-02 [2] CRAN (R 3.6.2)
 nortest        1.0-4    2015-07-30 [1] CRAN (R 3.6.0)
 officer        0.3.8    2020-03-13 [1] CRAN (R 3.6.3)
 openxlsx       4.1.4    2019-12-06 [1] CRAN (R 3.6.2)
 partykit       1.2-7    2020-03-06 [1] CRAN (R 3.6.3)
 pillar         1.4.3    2019-12-20 [1] CRAN (R 3.6.2)
 pkgbuild       1.0.6    2019-10-09 [1] CRAN (R 3.6.2)
 pkgconfig      2.0.3    2019-09-22 [1] CRAN (R 3.6.1)
 pkgload        1.0.2    2018-10-29 [1] CRAN (R 3.6.2)
 png            0.1-7    2013-12-03 [1] CRAN (R 3.6.0)
 polynom        1.4-0    2019-03-22 [1] CRAN (R 3.6.1)
 prettydoc      0.3.1    2019-11-23 [1] CRAN (R 3.6.3)
 prettyunits    1.1.1    2020-01-24 [1] CRAN (R 3.6.2)
 processx       3.4.1    2019-07-18 [1] CRAN (R 3.6.2)
 proto          1.0.0    2016-10-29 [1] CRAN (R 3.6.3)
 ps             1.3.0    2018-12-21 [1] CRAN (R 3.6.1)
 purrr          0.3.4    2020-04-17 [1] CRAN (R 3.6.3)
 quantmod       0.4-15   2019-06-17 [1] CRAN (R 3.6.2)
 R6             2.4.1    2019-11-12 [1] CRAN (R 3.6.2)
 RcmdrMisc      2.7-0    2020-01-14 [1] CRAN (R 3.6.3)
 RColorBrewer   1.1-2    2014-12-07 [1] CRAN (R 3.6.0)
 Rcpp           1.0.2    2019-07-25 [1] CRAN (R 3.6.1)
 readr        * 1.3.1    2018-12-21 [1] CRAN (R 3.6.1)
 readxl         1.3.1    2019-03-13 [1] CRAN (R 3.6.1)
 remotes        2.1.0    2019-06-24 [1] CRAN (R 3.6.2)
 rio            0.5.16   2018-11-26 [1] CRAN (R 3.6.2)
 rlang          0.4.6    2020-05-02 [1] CRAN (R 3.6.2)
 rmarkdown      2.1      2020-01-20 [1] CRAN (R 3.6.2)
 ROCR           1.0-11   2020-05-02 [1] CRAN (R 3.6.3)
 rpart          4.1-15   2019-04-12 [2] CRAN (R 3.6.2)
 rprojroot      1.3-2    2018-01-03 [1] CRAN (R 3.6.2)
 RSQLite        2.2.0    2020-01-07 [1] CRAN (R 3.6.3)
 rstudioapi     0.11     2020-02-07 [1] CRAN (R 3.6.2)
 rvest          0.3.5    2019-11-08 [1] CRAN (R 3.6.2)
 sandwich       2.5-1    2019-04-06 [1] CRAN (R 3.6.3)
 scales         1.1.0    2019-11-18 [1] CRAN (R 3.6.2)
 sessioninfo    1.1.1    2018-11-05 [1] CRAN (R 3.6.2)
 sf           * 0.9-3    2020-05-04 [1] CRAN (R 3.6.3)
 smbinning      0.9      2019-04-01 [1] CRAN (R 3.6.3)
 snakecase      0.11.0   2019-05-25 [1] CRAN (R 3.6.3)
 sqldf          0.4-11   2017-06-28 [1] CRAN (R 3.6.3)
 stringi        1.4.4    2020-01-09 [1] CRAN (R 3.6.2)
 stringr        1.4.0    2019-02-10 [1] CRAN (R 3.6.1)
 survival       3.1-8    2019-12-03 [2] CRAN (R 3.6.2)
 systemfonts    0.1.1    2019-07-01 [1] CRAN (R 3.6.3)
 testthat       2.3.1    2019-12-01 [1] CRAN (R 3.6.2)
 tibble         3.0.1    2020-04-20 [1] CRAN (R 3.6.3)
 tidyr        * 1.0.2    2020-01-24 [1] CRAN (R 3.6.2)
 tidyselect     1.1.0    2020-05-11 [1] CRAN (R 3.6.3)
 tinytex        0.19     2020-01-14 [1] CRAN (R 3.6.2)
 TTR            0.23-6   2019-12-15 [1] CRAN (R 3.6.2)
 units          0.6-6    2020-03-16 [1] CRAN (R 3.6.3)
 usethis        1.5.1    2019-07-04 [1] CRAN (R 3.6.2)
 uuid           0.1-4    2020-02-26 [1] CRAN (R 3.6.3)
 vctrs          0.3.1    2020-06-05 [1] CRAN (R 3.6.3)
 viridisLite    0.3.0    2018-02-01 [1] CRAN (R 3.6.1)
 webshot        0.5.2    2019-11-22 [1] CRAN (R 3.6.3)
 withr          2.1.2    2018-03-15 [1] CRAN (R 3.6.1)
 xfun           0.12     2020-01-13 [1] CRAN (R 3.6.2)
 xml2           1.2.2    2019-08-09 [1] CRAN (R 3.6.1)
 xtable         1.8-4    2019-04-21 [1] CRAN (R 3.6.1)
 xts            0.12-0   2020-01-19 [1] CRAN (R 3.6.2)
 yaml           2.2.0    2018-07-25 [1] CRAN (R 3.6.0)
 zip            2.0.4    2019-09-01 [1] CRAN (R 3.6.2)
 zoo            1.8-7    2020-01-10 [1] CRAN (R 3.6.2)

[1] C:/Users/belangew/Documents/R/win-library/3.6
[2] C:/Program Files/R/R-3.6.2/library

References

American Diabetes Association. (2015). The burden of diabetes in North Carolina. http://main.diabetes.org/dorg/PDFs/Advocacy/burden-of-diabetes/north-carolina.pdf

Barker, L. E., Thompson, T. J., Kirtland, K. A., Boyle, J. P., Geiss, L. S., McCauley, M. M., & Albright, A. L. (2013). Bayesian small area estimates of diabetes incidence by United States county, 2009. Journal of Data Science, 11(1), 269–280. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4537395/

Centers for Disease Control and Prevention. (2020). National diabetes statistics report. US Department of Health and Human Services. https://www.cdc.gov/diabetes/pdfs/data/statistics/national-diabetes-statistics-report.pdf

Klein, R. J., & Schoenborn, C. A. (2001). Age adjustment using the 2000 projected U.S. population. Healthy People 2000 Stat Notes, 20, 1–9.

Pierannunzi, C., Town, M., Garvin, W., Shaw, F. E., & Balluz, L. (2012). Methodologic changes in the behavioral risk factor surveillance system in 2011 and potential effects on prevalence estimates. Morbidity and Mortality Weekly Report, 61(22), 410–413. https://www.cdc.gov/mmwr/pdf/wk/mm6122.pdf